Concept-based semantic annotation, indexing and retrieval of office-like document units

Authors

  • Sasa Nesic
  • Fabio Crestani
  • Mehdi Jazayeri
  • Dragan Gasevic
Abstract

We present an ontology-driven approach to semantic annotation, indexing and retrieval of document units. This approach is based on a novel semantic document model (SDM) that we developed so that office-like document units can be uniquely identified, semantically annotated with concepts from annotation ontologies, and linked across document boundaries. In the semantic annotation model that we propose, we first lexically expand the descriptions of ontological concepts to enhance syntactic matching. Next, we expand the set of syntactic matches with semantically related concepts (i.e., semantic matches) discovered by exploring the annotation ontology. We then calculate the annotation weight of both the syntactic and the semantic matches by taking into account the effects of the lexical expansion and by measuring the semantic distance between ontological concepts. The retrieval model for document units uses an inverted concept index that we generate from the concepts used in the annotation and their weights for the document units they annotate. Results of a preliminary evaluation conducted with a prototype implementation are promising, and we present an analysis of these results.

Report info
Published: January 2010
Number: USI-INF-TR-2010-1
Institution: Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland
Online access: www.inf.usi.ch/techreports
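The pipeline described in the abstract (lexical expansion, syntactic matching, semantic expansion, weighting, inverted concept index) can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formulas, so everything specific here is an assumption made for illustration: the toy ontology, the `SYNONYM_PENALTY` and `DECAY` constants, the hop-based distance, and all function names are hypothetical and are not taken from the report.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept has a label plus synonyms that
# stand in for the "lexical expansion" of its description.
CONCEPTS = {
    "Car": {"label": "car", "synonyms": ["automobile"]},
    "Vehicle": {"label": "vehicle", "synonyms": []},
    "Engine": {"label": "engine", "synonyms": ["motor"]},
}
# Hypothetical ontology links; semantic distance is counted in hops.
EDGES = {"Car": ["Vehicle", "Engine"], "Vehicle": ["Car"], "Engine": ["Car"]}

SYNONYM_PENALTY = 0.8  # assumed: a match via an expanded term weighs less
DECAY = 0.5            # assumed: weight is halved per hop in the ontology
MAX_HOPS = 1           # assumed: explore only direct neighbours


def syntactic_matches(text):
    """Match a document unit's tokens against expanded concept descriptions."""
    tokens = set(text.lower().split())
    matches = {}
    for concept, desc in CONCEPTS.items():
        if desc["label"] in tokens:
            matches[concept] = 1.0
        elif any(s in tokens for s in desc["synonyms"]):
            matches[concept] = SYNONYM_PENALTY
    return matches


def semantic_expansion(matches):
    """Add related concepts, discounting their weight by ontology distance."""
    weights = dict(matches)
    frontier = dict(matches)
    for _ in range(MAX_HOPS):
        next_frontier = {}
        for concept, weight in frontier.items():
            for neighbour in EDGES.get(concept, []):
                candidate = weight * DECAY
                if candidate > weights.get(neighbour, 0.0):
                    weights[neighbour] = candidate
                    next_frontier[neighbour] = candidate
        frontier = next_frontier
    return weights


def build_index(units):
    """Build the inverted concept index: concept -> {unit id: weight}."""
    index = defaultdict(dict)
    for unit_id, text in units.items():
        annotations = semantic_expansion(syntactic_matches(text))
        for concept, weight in annotations.items():
            index[concept][unit_id] = weight
    return index


def retrieve(index, query_concepts):
    """Rank document units by the summed weights of the query concepts."""
    scores = defaultdict(float)
    for concept in query_concepts:
        for unit_id, weight in index.get(concept, {}).items():
            scores[unit_id] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under these assumptions, a unit that mentions only "automobile" is annotated with `Car` through lexical expansion and with `Vehicle` and `Engine` through semantic expansion, so a query for `Vehicle` can still retrieve it, although with a lower weight than a direct match would receive.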

Similar resources

Fuzzy C-Means Clustering for Biomedical Documents Using Ontology Based Indexing and Semantic Annotation

Search is the most obvious application of information retrieval. The variety of widely available biomedical data is enormous and expanding fast. Because of this expansion, existing techniques are not enough to extract the most interesting patterns from the collection according to user requirements. Recent research concentrates more on semantic-based searching than on the traditional term bas...

Search and Navigation in Semantically Integrated Document Collections

The paper presents a novel approach to semantic search and navigation in office-like document collections. The approach is based on a semantic document model that we have developed to enable unique identification, semantic annotation, and semantic linking of document units of office-like documents. In order to semantically annotate document units and to link semantically related document units, ...

Reflections on image indexing: a picture is worth a thousand words

Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing, and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...

Using latent semantic indexing for morph-based spoken document retrieval

Previously, phone-based and word-based approaches have been used for spoken document retrieval. The former suffers from high error rates and the latter from the limited vocabulary of the recognizer. Our method relies on an unlimited-vocabulary continuous speech recognizer that uses morpheme-like units discovered in an unsupervised manner. The morpheme-like units, or “morphs” for short, have been succe...

XML and Knowledge Technologies for Semantic-Based Indexing of Paper Documents

Effective daily processing of large amounts of paper documents in office environments requires the application of semantic-based indexing techniques during the transformation of paper documents to electronic format. For this purpose a combination of both XML and knowledge technologies can be used. XML distinguishes between data, its structure and semantics, allowing the exchange of data element...

Publication date: 2010